POMDP planning and execution in an augmented space
Authors
Abstract
In planning with partially observable Markov decision processes, pre-compiled policies are often represented as finite state controllers or sets of alpha-vectors, which provide a lower bound on the value of the optimal policy. Some algorithms (e.g., HSVI2, SARSOP, GapMin) also compute an upper bound to guide the search and to offer performance guarantees, but they do not derive a policy from this upper bound for computational reasons: executing a policy derived from an upper bound requires a one-step lookahead simulation to determine the next best action, and evaluating the upper bound at the reachable beliefs is complicated and costly (i.e., it requires linear programming or the sawtooth approximation). The first aim of this paper is to show principled and computationally cheap ways of executing upper bound policies, which can even be faster than executing lower bound policies based on alpha-vectors. The second, complementary contribution is a new method for finding better upper bound policies that outperforms those obtained by existing algorithms, such as HSVI2, SARSOP, and GapMin, on a suite of benchmarks. Our approach is based on a novel synthesis of augmented and deterministic POMDPs and facilitates efficient optimization of upper bound policies.
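To make the execution cost discussed above concrete, the following is a minimal sketch (not the paper's method) of the two ingredients of upper-bound policy execution: sawtooth interpolation of an upper bound from corner values and solver-collected belief points, and a greedy one-step lookahead against that bound. All function and variable names are illustrative.

```python
import numpy as np

def sawtooth_upper_bound(b, corner_values, points):
    """Sawtooth interpolation of an upper bound at belief b.

    corner_values[s] is the bound at the corner belief e_s (e.g., the
    underlying-MDP value); points is a list of (belief, value) pairs
    collected during search. With no points, this reduces to the
    corner interpolation v0 . b.
    """
    v0 = float(np.dot(b, corner_values))
    best = v0
    for b_i, v_i in points:
        mask = b_i > 0
        # Largest weight that can be placed on point (b_i, v_i) at b.
        ratio = float(np.min(b[mask] / b_i[mask]))
        best = min(best, v0 + ratio * (v_i - float(np.dot(b_i, corner_values))))
    return best

def lookahead_action(b, T, Z, R, gamma, corner_values, points):
    """Greedy action w.r.t. the sawtooth bound via one-step lookahead.

    T[a, s, s'] = P(s'|s, a), Z[a, s', o] = P(o|s', a), R[s, a] = reward.
    """
    best_a, best_q = 0, -np.inf
    for a in range(T.shape[0]):
        q = float(np.dot(b, R[:, a]))          # expected immediate reward
        pred = b @ T[a]                        # predicted next-state distribution
        for o in range(Z.shape[2]):
            p_o = float(np.dot(pred, Z[a, :, o]))
            if p_o > 1e-12:
                b_next = pred * Z[a, :, o] / p_o   # Bayes belief update
                q += gamma * p_o * sawtooth_upper_bound(b_next, corner_values, points)
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q
```

Each lookahead step touches every (action, observation) pair and evaluates the sawtooth at each resulting belief, which is exactly the per-step cost the abstract identifies as the obstacle to executing upper-bound policies directly.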
Similar Resources
Flexible POMDP Framework for Human-Robot Cooperation in Escort Tasks
We describe a novel method for ensuring cooperation between human and robot. First, we present a flexible and hierarchical framework based on POMDPs. Second, we introduce a set of cooperative states within the state-space of the POMDP. Third, for ensuring an efficient scalability, the framework partitions the overall task into independent planning modules. Lastly, for a robust execution of the ...
Monitoring plan execution in partially observable stochastic worlds
This thesis presents two novel algorithms for monitoring plan execution in stochastic partially observable environments. The problems can be naturally formulated as partially-observable Markov decision processes (POMDPs). Exact solutions of POMDP problems are difficult to find due to the computational complexity, so many approximate solutions are proposed instead. These POMDP solvers tend to ge...
Covering Number for Efficient Heuristic-based POMDP Planning
The difficulty of POMDP planning depends on the size of the search space involved. Heuristics are often used to reduce the search space size and improve computational efficiency; however, there are few theoretical bounds on their effectiveness. In this paper, we use the covering number to characterize the size of the search space reachable under heuristics and connect the complexity of POMDP pl...
Planning to see: A hierarchical approach to planning visual actions on a robot using POMDPs
Flexible, general-purpose robots need to autonomously tailor their sensing and information processing to the task at hand. We pose this challenge as the task of planning under uncertainty. In our domain, the goal is to plan a sequence of visual operators to apply on regions of interest (ROIs) in images of a scene, so that a human and a robot can jointly manipulate and converse about objects on ...
Point-Based Policy Transformation: Adapting Policy to Changing POMDP Models
Motion planning under uncertainty that can efficiently take into account changes in the environment is critical for robots to operate reliably in our living spaces. Partially Observable Markov Decision Process (POMDP) provides a systematic and general framework for motion planning under uncertainty. Point-based POMDP has advanced POMDP planning tremendously over the past few years, enabling POM...
Publication date: 2014